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Abstract 

Background: Approximately one-third of those treated curatively for colorectal cancer (CRC) will experience recurrence. 
No evidence-based consensus exists on how best to follow patients after initial treatment to detect asymptomatic 
recurrence. Here, a new approach for simulating surveillance and recurrence among CRC survivors is outlined, and 
development and calibration of a simple model applying this approach is described. The model's ability to predict 
outcomes for a group of patients under a specified surveillance strategy is validated. 

Methods: We developed an individual-based simulation model consisting of two interacting submodels: a 
continuous-time disease-progression submodel overlain by a discrete-time Markov submodel of surveillance and 
re-treatment. In the former, some patients develops recurrent disease which probabilistically progresses from 
detectability to unresectability, and which may produce early symptoms leading to detection independent of 
surveillance testing. In the latter submodel, patients undergo user-specified surveillance testing regimens. Parameters 
describing disease progression were preliminarily estimated through calibration to match five-year disease-free survival, 
overall survival at years 1-5, and proportion of recurring patients undergoing curative salvage surgery from one arm of 
a published randomized trial. The calibrated model was validated by examining its ability to predict these same 
outcomes for patients in a different arm of the same trial undergoing less aggressive surveillance. 

Results: Calibrated parameter values were consistent with generally observed recurrence patterns. Sensitivity analysis 
suggested probability of curative salvage surgery was most influenced by sensitivity of carcinoembryonic antigen assay 
and of clinical interview/examination (i.e. scheduled provider visits). In validation, the model accurately predicted overall 
survival (59% predicted, 58% observed) and five-year disease-free survival (55% predicted, 53% observed), but was less 
accurate in predicting curative salvage surgery (10% predicted; 6% observed). 

Conclusions: Initial validation suggests the feasibility of this approach to modeling alternative surveillance regimens 
among CRC survivors. Further calibration to individual-level patient data could yield a model useful for predicting 
outcomes of specific surveillance strategies for risk-based subgroups or for individuals. This approach could be applied 
toward developing novel, tailored strategies for further clinical study. It has the potential to produce insights which will 
promote more effective surveillance — leading to higher cure rates for recurrent CRC. 
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Background 

Roughly two-thirds of the more than 140,000 Americans 
diagnosed with colorectal cancer (CRC) each year will be 
treated surgically with curative intent [1]. Approximately 
one-third of these will experience recurrence of the 
original disease or will develop a new primary (metachro- 
nous) CRC [2,3]- Median survival for those experiencing 
recurrence of original disease is around two years [4]. 
Ultimately, nearly 50,000 patients in the U.S. die each year 
from CRC [1]. 

Most patients treated curatively are placed on some type 
of surveillance program involving periodic follow-up test- 
ing to detect preclinical recurrence. For patients who will 
experience recurrence, prognosis, though generally poor, 
may be improved if detection occurs prior to symptom 
onset, particularly if surgical resection of metastatic dis- 
ease is possible [5]. Reviews of the relatively small number 
of trials comparing two or more specific follow-up strat- 
egies have suggested that more intensive strategies tend to 
increase survival at five years, detecting recurrence about 
six months earlier than less intensive strategies [6-8] at a 
point where salvage surgery with curative intent is more 
likely to occur (10.7% vs. 5.7%; p = .0002) [7]. However, 
the strategies tested, the populations studied, and the 
study periods varied significantly between trials [6-8]. As 
such, drawing prescriptive conclusions regarding best 
practices on the basis of these data is difficult. 

Meanwhile, in economic terms, CRC surveillance 
consumes significant resources. U.S. investigators found 
that five-year follow up can cost as much as US$16,942 per 
patient [9]. Another group in Europe reported surveillance 
costs of US$9,011 [10] to US$59,841 [11] per detection of a 
recurrence resulting in attempted curative salvage surgery. 

Not surprisingly, consensus is lacking among expert 
panels on how best to follow these patients [12-18]. New 
surveillance trials are in progress, but results may be several 
years away [19-21]. When complete, these trials will pro- 
vide valuable information but will have been able to exam- 
ine only a small fraction of possible surveillance strategies. 

Given the potential impact on quantity and quality of 
life and on health care costs, better tools are needed for 
informing decisions around postsurgical surveillance of 
colorectal cancer patients. By simulating the dynamics of 
recurrence in a population of patients, a realistic computer 
simulation model could function as a virtual laboratory 
within which an unlimited number of experiments com- 
paring hypothetical surveillance strategies could be run in 
silico within a compressed timeframe. 

Several models have incorporated dynamics of the 
adenoma-carcinoma sequence in order to compare the ef- 
fectiveness of hypothetical screening strategies in patients 
without history of CRC [22-32]; these include three 
models used in the National Cancer Institutes compara- 
tive modeling effort CISNET (Cancer Intervention and 



Surveillance Modeling Network) [33]. Fewer models 
though have simulated the events following diagnosis and 
treatment of CRC in order to compare postsurgical 
surveillance strategies [34-37]. None has captured the 
dynamics of recurrence in a way that accounts for disease 
progression during diagnostic delay and that considers the 
full range of possible metastatic sites. 

Capturing the dynamics of CRC recurrence is a major 
methodological challenge mainly because of the difficulty 
of estimating parameters describing progression of recur- 
ring disease amid the censoring caused by medical and 
surgical interventions. In order to create a realistic model 
which allows assessment of any hypothetical surveillance 
strategy, one must be able to account for disease progres- 
sion amid diagnostic delay. Here, we describe a new 
approach to modeling the interaction between natural 
history of CRC recurrence and early detection of recur- 
rence through surveillance testing— an approach designed 
to allow the simulation of any potential surveillance strat- 
egy. We introduce a basic model we have developed which 
applies this approach, preliminarily estimate disease pro- 
gression parameters by calibration based on published 
outcomes from a classic surveillance trial, and offer a 
quantitative validation of the model. 

Methods 

Overview of approach 

The model itself is comprised of two interacting submo- 
dels: a continuous-time disease progression submodel and 
a discrete time Markov submodel of surveillance testing 
and re-treatment. In the disease progression submodel, 
the exact time to earliest recurrence detectability is 
pre-determined for each simulated patient who will recur 
based on random draws from an exponential probability 
distribution. A pair of formulas— both functions of the 
time to earliest detectability-determines the timing of the 
transition to unresectability and to the point of symptom 
onset. Once these pre-determinations are made, the 
discrete-time Markov surveillance and re-treatment sub- 
model simulates scheduled visits for surveillance testing of 
asymptomatic patient as frequently as every three months. 
This submodel references the pre-determined timeline of 
disease progression to determine whether asymptomatic 
recurrences are detectable by testing during surveillance 
visits, and whether recurrences are considered potentially 
resectable versus unresectable at the time they are discov- 
ered. Recurrences may alternatively be detected in the 
interval between surveillance visits as a result of symp- 
toms which prompt individuals to seek earlier care. 

To simulate the impact of any combination and sched- 
ule of surveillance tests, the disease progression submodel 
must be capable— in the extreme— of simulating disease 
progression in the absence of surveillance. Since most data 
describing recurrence tends to be "contaminated" by the 
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effect of testing to detect asymptomatic recurrence and by 
subsequent intervention, it is difficult to directly estimate 
certain key parameters underlying a disease progression 
submodel Therefore, we use a calibration approach to 
estimate these parameters while using the surveillance and 
re-treatment submodel described above to control for the 
effect of a known surveillance regimen on observed dis- 
ease progression. Once these disease progression parame- 
ters are estimated through calibration, the schedule of 
follow-up tests embodied in the surveillance and re- 
treatment submodel can be changed to simulate any num- 
ber of hypothetical follow-up regimens. In the case of the 
current work, we use the surveillance regimen followed in 
the intervention arm of a published CRC follow-up trial to 
calibrate disease progression parameters. We then change 
the surveillance regimen to be consistent with the control 
arm of the same study and compare the predicted to the 
observed outcomes. 

Modeling disease progression 

Each individual i is randomly assigned a future recur- 
rence status (i.e. will recur at some point or not) and site 
based on random number draws from a uniform prob- 
ability distribution with range 0 to 1. Anatomic sites of 
recurrence are described in terms of local tumor status 
(none, locoregional, anastomotic only, or intraluminal/ 
metachronous) and metastasis status (none, liver, lung, 
or multiple/other organs). All recurrences are considered 
to "begin" at the time of initial treatment since, by defin- 
ition, there would have been at least some small amount 
of cancerous tissue which remained despite treatment. 
Each individual who will recur is assigned a future time Di 
which represents the earliest point in time at which the 
recurrence of patient i will be detectable through surveil- 
lance testing. Di is assigned randomly to these individuals 
from a continuous exponential probability distribution 
with hazard rate r d , leading to a declining incidence of 
recurrence with increasing time from initial treatment at 
the population level [38]. The time Ui at which the case is 
no longer amenable to curative salvage surgery (point of 
unresectability) is defined as 

U i = D i + X du *(l + r u ) Dl 

Where x du is the baseline (i.e. where Di = 0) interval 
between D t and Ui and r u is the rate of change in this 
window. A positive value of X du would represent the 
minimum possible width of the window between earliest 
detectability and unresectability given r u > 0. Beyond the 
point of unresectability, the patient will only be eligible 
for palliation and will be assigned a life expectancy 
accordingly. 

A non-trivial number of recurrences are discovered 
through work-ups precipitated by patient symptomatic 



presentations in the intervals between scheduled surveil- 
lance visits [39]. This proportion can be expected to 
grow in inverse proportion to the vigilance of follow-up 
regimens, and should thus be modeled. The point of 
recurrence symptom onset Si for patient i is defined as 

Si = Di + X ds *(l + r s ) Di +8 ds 

Where x ds is the baseline (D { = 0) interval between Di 
and Si and r s is the rate of change in this window. A posi- 
tive value of X ds would represent the minimum width of 
the window between earliest detectability and symptom 
onset given r s > 0. The normally distributed error term e ds 
(with mean = 0 and standard deviation = o* ds ) is included 
to ensure sufficient variation in symptom onset so that 
some patients will develop symptoms before the point of 
unresectability and some will develop symptoms after that 
point. Figure 1 illustrates the two possible scenarios for 
progression of recurring individuals through clinically 
important points in disease natural history. 

Modeling surveillance and re-treatment 

We used TreeAge Pro 2012 (Williamstown, MA) to create 
an individual-based Markov submodel of surveillance 
testing and re-treatment. Individual-based Markov models 

( N 



Time from initial curative surgery 




Time from initial curative surgery 



Figure 1 Modeling disease progression in recurrent colorectal 
cancer. Among the subset of individuals who will experience 
recurrence, cancerous tissue is considered to be present but 
undetectable until time D,. At time, S i; the patient will develop 
clinical symptoms of recurrence which will bring them to seek 
medical attention and will lead to diagnosis of recurrence if not 
previously detected. At time U,, the recurrent cancer will become 
"unresectable": no longer amenable to curative treatment. U, may be 
greater-than (Scenario 1) or less-than (Scenario 2) S,. In a scenario of 
no follow-up for detection of asymptomatic recurrence, the only 
patients for whom curative treatment of recurrence would be possible 
are those with S, < U, (Scenario 1) since symptoms would have brought 
them to seek medical attention at a point where curative re- 
treatment could occur. Continuous, heterogeneous values of D j; 
Sj, and Uj are assigned to each patient who will recur based on 
the disease progression submodel as described in the text. 

V ) 
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allow simulation of phenomena which involve transitions 
of heterogeneous individuals between states over time. 
During each discrete three-month time increment — or 
cycle— each of a series of hypothetical individuals may be 
in only one state at a time. Within a state, individuals may 
experience a series of probabilistic events which determine 
whether they will remain in their current state during the 
next cycle or transition to a new state. In our model, a 
series of simulated patients starts each cycle in one of five 
states: no known recurrence, recurrence curatively treated, 
recurrence palliatively treated, deceased due to cancer, or 
deceased due to other causes (See state transition diagram 
in Figure 2). Three months is used as a cycle length since 
this represents the shortest recommended inter-visit inter- 
val among published recommendations for post-surgical 
surveillance in CRC patients [12,15,16,40]. Note that only 
surveillance testing of asymptomatic patients occurs based 
on discrete time steps. Natural history events are mea- 
sured in continuous time so that simulated patients will 
reach the point of detectability (DJ, the point of unresect- 
ability (Ui), and the point of symptom onset (Si) during 
the intervals between surveillance visits. 

The time horizon of five years for surveillance was 
chosen because this is the timeframe over which virtually 
all recurrences manifest [7], and the majority of surveil- 
lance recommendations focus on this five-year horizon 
[12,15,16,40]. All individuals begin the first cycle— repre- 
senting the period immediately following curative resec- 
tion — in the no known recurrence state. 

Each of the living states in the surveillance and re- 
treatment submodel is represented by a decision subtree 
(see Additional file 1: Figure SI). Whether or not a test 
is performed during a particular three-month cycle is 
dictated by a surveillance testing schedule matrix with a 
row for each three month cycle and a column for each 
possible testing modality. Using Is and Os, this matrix 
can specify any combination and timing of laboratory 
tests (e.g. carcinoembryonic antigen, or CEA), imaging 
studies (e.g. abdominal/pelvic CT), and provider visits. 
Positivity or negativity of each test given during a cycle 
depends on the current stage of the model compared to 
Di, the location of any detectable tumors, and the sensi- 
tivity and specificity of the test being employed. If any of 
the tests performed in a given cycle is positive, it is 
assumed that sufficient further workup using other avail- 
able tests is undertaken to correctly identify detectable 
recurrence or otherwise rule out recurrence within that 
cycle with a combined sensitivity and specificity of 100%. 
Once recurrence is diagnosed (whether based on detec- 
tion through scheduled surveillance or stemming from 
symptoms developing between scheduled visits), patients 
are assigned a life expectancy. The exact value depends 
on whether their recurrence has been diagnosed before 
or after reaching the point of unresectability (Table 1). 




Time from initial curative surgery 



Figure 2 Possible state transitions in the surveillance and 

re-treatment submodel. An individual-based Markov submodel 

simulates the follow-up testing which may lead to detection of 

asymptomatic recurrence, and the treatment which may occur after 

recurrence diagnosis. The arrows represent possible state transitions 

which could occur at each three-month time step. Each patient 

begins the model in the "No known recurrence" state wherein they 

undergo surveillance testing according to a sepcified schedule. If a 

true recurrence is discovered, the patient will proceed to one of the 

two states: "Recurrence curatively treated" or "Recurrence palliatively 

treated", depending on whether the current model cycle occurs 

before or after U, (U, is assigned as a continuous value based on the 

disease progression submodel). Patients in the "Recurrence curatively 

treated" state continue to undergo surveillance testing after treatment. 

During each cycle, individuals may move to the "Dead due to other 

causes" state from any other living state based on a background 

transition probability of mortality from other causes. Additional file 1: 

Figure SI provides a more detailed depiction of the contingencies which 

drive state transitions in the surveillance and re-treatment submodel, 
k J 

Parameters 

Calibration to estimate disease progression parameters 

Values for the six disease progression parameters described 
above (r d , x du , r u , x ds , r s , and a ds ) are impossible or imprac- 
tical to observe, and their values can be expected to differ 
based on disease and patient characteristics of the groups 
of interest. Therefore, we used a calibration process to esti- 
mate values for these parameters. We systematically ran 
the model with many thousands of different combinations 
of the unknown parameters in order to find the combin- 
ation which yielded model outcomes most consistent with 
a real world data source [54], As a data source, we used 
published data from a classic trial of an intensive versus 
a minimal follow-up strategy for patients who had 
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Table 1 Estimates for non-calibrated parameters 



Variable 



Input 
parameter 
estimate 



Range for 
sensitivity 
analysis 



Source 3 



Screening test performance 

Carcinoembryonic Antigen test (CEA) 

Sensitivity .64 

Specificity .90 
Chest X ray 

Sensitivity .76 

Specificity .95 
CT - Hepatic metastases 

Sensitivity .83 

Specificity .93 
CT - Other abdominal metastases 

Sensitivity .46 

Specificity .98 
Hepatic ultrasound 

Sensitivity .62 

Specificity .85 
Colonoscopy 

Sensitivity .95 

Specificity 1 .00 
Clinical in terview/examination b 

Sensitivity .42 

Specificity .95 
Life expectancy 

After initial surgery, 



.49- 79 [41] 

.75 -1.00 [41] 

.61 - .91 [39,42,43] 

.80 - 1 .00 [43] 

.68 - .98 [44-47] 

.78 - 1 .00 [44,47] 



given no recurrence 

After recurrence with 
curative salvage 

After diagnosis of 
unresectable recurrence 



20.7 years 



21 months 



8 months 



.31 - .61 
.83 - 1 .00 

.47 - .77 
.70 - 1 .00 

.80 - 1 .00 
.85 - 1 .00 

.27 - .57 
.70 - 1 .00 



15 - 27 
4-12 



[44] 
[44] 

[47-49] 
[47-49] 

[23,50] 
[23,50] 

[5] 
[5] 

[51] 
[52] d 
[53] e 



a When multiple sources are given, the parameter used in the model was 
estimated based on a sample-size-weighted mean. 
b ln the absence of laboratory or other diagnostic findings 
suggesting recurrence. 

c Since the time horizon for the model was 5 years, varying this parameter over 
all but the most extreme low bounds would have no effect on results. 
d Estimate based on incomplete reporting of survival among subjects 
undergoing curative salvage surgery in the Pietra study. 
e Based on reported survival of 13 months following recurrence of colon cancer 
during the era of 1986-1992, and assumption that 1/3 of those relapsing will 
undergo curative salvage surgery with survival of 21 months as described 
above. The result is that those recurring but not undergoing curative salvage 
surgery would be expected to survive approximately 8 months after diagnosis 
of recurrence. 

undergone curative resection for CRC from 1987-1990 at 
an Italian Center [52]. Among all available trials of surveil- 
lance strategies, this study by Pietra et al offered the most 
complete data needed for the exercise. Patients in the trial 
had been treated surgically for Astler-Coller Stage B or C 
CRC without adjuvant chemotherapy (additional details 
describing study participants in Table 2). The focus of the 



trial was on local recurrences; however, data for subjects 
with metastatic recurrences was also reported. We used 
data from the intensive follow-up arm to calibrate our 
model; accordingly, we refer to this group as the calibra- 
tion group. To account for risk of non-cancer death among 
individuals who have not been diagnosed with recurrence, 
we calibrated an additional parameter (m) representing 
group-specific background (i.e. from non-CRC causes) 
mortality probability per cycle. We considered this a more 
reliable method to estimate background mortality in a 
small group compared to using life table-based estimates. 

We selected a set of seven observable outcomes 
reported by Pietra et al. to serve as calibration targets 
(note that these are distinct from the seven parameters 
estimated through calibration). The discrepancies between 
observed and model-predicted values of these targets were 
compared for each combination of values of the disease 
progression and background mortality parameters. The 
targets used were disease-free survival at five years (DFS5), 
overall survival at years one through five (OS1 - OS5), 
and the proportion of patients undergoing eventual 
curative salvage surgery following recurrence. Group 
characteristics and recurrence rates, surveillance testing 
schedule, and observed values for each of the calibration 
targets are shown for the intensive (calibration) group 
in the middle column of Table 2. 

Searching the parameter space was achieved using a 
series of increasingly narrow grid searches. Specifically, each 
of the seven parameters to be calibrated was given an 
initially broad range (shown in the third column of Table 3) 
which was divided into three to eight evenly-spaced inter- 
vals. Values from across the initial range were systematically 
sampled so that all possible combinations of parameters 
were tried (i.e. a grid search of the parameter space [54]). 
Each run of the model simulated a cohort of 10,000 
patients. Model-generated values of each calibration target 
were compared to values reported by Pietra et al. Candidate 
disease progression parameter sets which yielded target 
values > +/- one percentage point (or > +/- two percentage 
points if no values varying less than one percentage point 
were found) from the observed target value for DFS5 were 
eliminated from consideration. Next, model-generated and 
observed values for OS5 were compared in this same man- 
ner to further reduce the set of candidate parameter values, 
followed by OS4, OS3, proportion of patients undergoing 
curative salvage surgery, OS2, and finally OS1. OS2 and 
OS1 were considered last since these were most subject to 
influence from small numbers of events. The result was a 
small group of remaining parameter sets. Based on the 
range of candidate parameter values suggested by this 
process, a second round of calibration was performed, this 
time with a narrower set of candidate parameter values. 
This process was repeated for subsequent rounds until no 
additional gains in model fit— as judged by changes in the 
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Table 2 Comparison of calibration group and validation group based on intensive and minimal follow-up groups, 
respectively, from trial by Pietra et al. [52] 



Value 



Calibration group 
(Intensive follow-up) 



Validation group 
(Minimal follow-up) 



Group characteristics 

Number of subjects 
Male 

Mean age at diagnosis 

Astler-Coller stage B/C 

Primary colon/rectal tumors 

Preoperative complications 

Recurrence rate during study period 

Distribution of metastatic disease if present: 
Liver/Lung/Other (includes multiple organs) 6 

Surveillance testing schedule 

Clinical interview/exam 

CEA 

Hepatic ultrasound 

CT of abdomen/pelvis 
Chest x-ray 
Colonoscopy 
Outcome targets 

Disease-free survival at 5 years (DFS5) 

Overall survival at one year after initial surgery (0S1) 

0S2 

0S3 

0S4 

0S5 

Proportion undergoing curative salvage re-operation for recurrence 



104 
56% 
62.2 +/- 1 1 
59.6%/40.4% 
70.2%/29.8% 
14% 
39.4% 
26.7%/0.0%/73.3% 



Every 3 months for 2 years; 
every 6 months thereafter 

Every 3 months for 2 years; 
every 6 months thereafter 

Every 3 months for 2 years; 
every 6 months thereafter 

Every 12 months 

Every 12 months 

Every 12 months 



97% 
90% 
84% 
76% 
73% 
20% 



103 
51% 
64.4 +/- 12 
58.3%/41.7% 
64.1%/35.9% 
11% 
40.4% a 
14.2%/4.8%/81.0% 



Every 6 months for 1 year; 
every 1 2 months thereafter 

Every 6 months for 1 year; 
every 1 2 months thereafter 

Every 6 months for 1 year; 
every 1 2 months thereafter 

None 

Every 12 months 

Every 12 months 



53% 
98% 
89% 
74% 
65% 
58% 



CEA = serum carcinoembryonic antigen assay; OSx = overall survival at x years, 
includes a single metachronous (new primary) CRC 

Metastatic site was only reported by Pietra et al. for patients whose recurrences were metastatic only, and did not feature simultaneous involvement around the 
original tumor site. 



value of a goodness -of-fit statistic— were achieved. The 
goodness-of-fit statistic used for this purpose was a simple 
unweighted sum of squared differences between observed 
and predicted values for the calibration targets. The lowest 
goodness-of-fit value from among all sets of parameters 
from one round of calibration was compared to the lowest 
value from the previous round. Once no further improve- 
ment was seen between rounds, no further rounds of cali- 
bration were undertaken. After the final calibration round, 
the remaining parameter sets collectively contained a small 
range of candidate values for each parameter. A single best 
set was chosen which contained the values closest to the 
midpoint of the final range for each parameter (in the same 
order of priority which was applied throughout calibration: 
DFS5, OS5, OS4, OS3, probability of curative salvage 



operation, OS2, then OS1). This approach of using mul- 
tiple, increasingly narrow grid searches was preferred on 
efficiency grounds to an approach of performing a single 
grid search with a large number of closely-spaced intervals 
for each parameter. With the latter approach, the number 
of parameters combinations would have been astronomical. 
Ultimately, we performed nine successive rounds of calibra- 
tion, examining a total of 276,960 distinct parameter com- 
binations (10,000 patients simulated for each parameter 
combination) in order to arrive at the final parameter 
estimates. 

Non-calibrated parameters 

Test sensitivity and specificity, as well as life expectan- 
cies, were estimated based on values from published 
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Table 3 Starting ranges used in first round of model calibration - ranges were narrowed with successive rounds of 
calibration to achieve improved fit to the outcomes reported in the intensive follow-up arm of the Pietra trial [52] 
Parameter Definition Starting range (inclusive) Final calibrated parameter value 

r d Rate per 3-month cycle at which recurrences transition from 0.05 - 0.12 0.092 
undetectable to detectable 

x ds Baseline (D, = 0) presymptomatic window 1 week - 9 months 1 7 weeks 

x du Baseline (D, = 0) window of resectability 1 week - 9 months 6 weeks 

r s Increment in presymptomatic window for each additional unit of D, -0.025 - 0.15 0.07 

(expressed as a rate per 3-month cycle) 

r u Increment in window of resectability for each additional unit of D, 0 - 0.15 0.1 1 

(expressed as a rate per 3-month cycle) 

a ds Standard deviation of normally-distributed error term 1 week - 6 months 1 1 weeks 

for presymptomatic window 

m Background five-year cumulative probability of mortality 1% - 20% 1.6% a 

from non-CRC causes 

a By comparison, five year cumulative probability of death from any cause in the Italian population for a group with the age and gender makeup of the calibration 
group has been estimated at 6.1% for 1990 [51]. 



studies (Table 1). Since evidence-based estimates were 
obtainable, calibration of these parameters was not 
necessary. Sensitivity and specificity values shown are for 
individual tests in isolation. It was assumed that metas- 
tases to multiple locations or to single sites besides the 
liver or lungs would be inoperable. It was also assumed 
that lung metastases were 50% as likely as liver metasta- 
ses to be operable [55-57]. 

Sensitivity analysis 

One-way sensitivity analyses were performed on all non- 
calibrated model parameters according to the ranges 
shown in Table 1. The primary outcomes of interest in 
sensitivity analyses were OS5 and proportion of subjects 
undergoing curative salvage surgery for recurrence of 
CRC. The same randomization algorithm seed value was 
used throughout sensitivity analysis to minimize the 
impact of stochastic variability on comparisons with 
baseline model outputs. 

Validation 

Once values for the six parameters describing the progres- 
sion of CRC recurrence and the parameter representing 
background mortality for those without known recurrence 
had been estimated by calibration, the model was used to 
predict the outcomes that would be expected in the min- 
imal follow-up arm of the Pietra trial; we will refer to this 
arm as the validation group. The primary purpose of this 
validation was to assess the models ability to predict 
cancer-related outcomes for a group with known disease 
characteristics and non-cancer mortality risk under a spe- 
cified surveillance regimen. Keeping all calibrated disease 
progression parameters constant, the follow-up regimen 
modeled was modified to be consistent with that under- 
gone by subjects in the validation group (see Table 2). Be- 
cause the Pietra trial was not sufficiently large that subject 



characteristics were evenly distributed between groups 
through randomization, the proportion of patients relaps- 
ing and the distribution of metastatic sites in the model 
(see Table 2) were adjusted in validation to match what 
was reported for the minimal follow-up group. Specific- 
ally, probability of recurrence was adjusted from 39.4% to 
40.4%, and distribution of metastatic disease between 
liver/lung/other was adjusted from 26.7%/0.0%/73.3% to 
14.2%/4.8%/81.0%. Pietra et al. reported a substantial dif- 
ference in DFS5 between the intensive and minimal sur- 
veillance groups (68% and 53% respectively). Only 1.0 
percentage point of this difference was explained by a dif- 
ference in recurrence rate. Since eventual cancer-related 
deaths do not affect disease-free survival calculation (ra- 
ther, it is the diagnosis of recurrence which precedes death 
that counts), it was assumed that the remaining difference 
in DFS5 was attributable to differences in background 
mortality. We accordingly adjusted the five-year back- 
ground cumulative probability of death from other causes 
in the validation group by 14.0%, equating to a 3.4 per- 
centage point increment in annual mortality from other 
causes in the validation group compared to the calibration 
group. No adjustments were made to the structure of the 
model itself. 

Results 

Model calibration 

Final calibrated parameter estimates are shown in the 
fourth column of Table 3. Figure 3 compares observed 
values from the intensive follow-up (calibration) group 
in the Pietra trial to the model-generated outputs for the 
same group using the final best-fitting parameter set. As 
seen from the figure, predicted overall survival is slightly 
overestimated early in the five-year time horizon, while 
five-year disease-free survival and the rate of curative 
salvage surgery are slightly underestimated by the model. 
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Figure 3 Observed outcomes in intensive follow-up 
(calibration) group compared to outputs of fitted model. 

DFS5 = disease-free survival at 5 years. 



Additional file 2: Figure S2 depicts the disease progres- 
sion milestones for 20 simulated patients with recur- 
rence based on the calibrated parameters. 

Sensitivity analysis 

In order to examine the sensitivity of key model results to 
changes in non-calibrated parameters, we conducted a 
deterministic sensitivity analysis, varying all non-calibrated 
parameters in the fitted model across the ranges shown in 
the third column of Table 1. The outcomes of interest for 
sensitivity analysis were the proportion of patients under- 
going eventual salvage surgery (denominator = all patients 
initially treated for CRC, whether or not they eventually 
recurred) and OS5. 

Figure 4a depicts sensitivity analysis results for the out- 
come representing the proportion of patients undergoing 
curative salvage surgery. The most influential parameter 
was test sensitivity of the CEA assay, with a 15 percentage 
point increase in sensitivity to 79% resulting in a 7.9% in- 
crease in proportion undergoing curative salvage surgery, 
and a 15 percentage point decrease to 49% resulting in a 
7.4% decrease in proportion undergoing curative salvage 
surgery. Varying all other parameters over the specified 
ranges yielded variation in curative salvage surgeries of 
within 5% of the base case. 

Figure 4b shows sensitivity analysis results for the 
outcome of overall survival at 5 years. Life expectancy 
values had the greatest impact on this outcome, though 
no parameter proved highly influential. 

Validation 

In order to assess the predictive value of the model for a 
similar population under a different follow-up regimen, 
we ran the model for the minimal follow-up arm of the 
Pietra trial and compared model-predicted with ob- 
served outcomes. This comparison is shown in Figure 5. 
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Figure 4 One-way sensitivity analysis results, a) When varied over 
the ranges indicated at either end of each horizontal bar, these 
parameters had the greatest impact on the predicted proportion of 
total patients undergoing curative salvage surgery for CRC recurrence 
within five years of initial treatment, b) Parameters with the greatest 
impact on predicted overall survival at five years (OS5). 



The largest difference between observed and predicted 
outcomes for the validation group was in the proportion 
of subjects who would eventually undergo curative sal- 
vage surgery for recurrence. The model overestimated 
this proportion at 10%, while only 6% of patients in real- 
ity underwent curative salvage surgery in the validation 
group. 

Discussion 

We have introduced a new approach for simulating 
colorectal cancer surveillance and recurrence and have de- 
scribed a model which applies this approach. We estimated 
a best-fitting disease progression parameter set for this 
model by comparing iteratively generated model outputs 
to observed outcomes in one arm of a classic randomized 
controlled trial comparing two follow-up regimens (cali- 
bration). We then validated the model by demonstrating 
that it closely approximated several key outcomes for a 
separate group of patients followed less intensively in the 
same trial (validation). Application of this approach in 
subsequent based on larger, individual-level data sets could 
facilitate more personalized post-surgical surveillance of 
CRC survivors, ultimately translating into fewer low-yield 
tests and more lives saved through early detection of 
recurrence. 

The approach of developing a generic disease progres- 
sion submodel and calibrating its parameters based on 
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Figure 5 Predicted vs. observed outcomes for the validation 
(minimal follow-up) group. DFS5 = disease-free survival at 5 years. 



outcomes data was taken because of the lack of biological 
data needed to directly estimate these parameters. The 
natural history behavior of recurring colorectal cancer is 
difficult to observe directly and is extremely heteroge- 
neous. This heterogeneity likely stems, at least partly, from 
the greater genotypic diversity of genomically unstable 
tumor tissue which has been exposed to the stresses of 
treatment. 

Though the specific values for the parameter estimates 
pertain to the patients in the Pietra trial and should not be 
construed to broadly represent disease behavior in the 
context of more recent treatment norms, they are consist- 
ent with generally-recognized patterns of CRC recurrence. 
Specifically, the estimates imply that those who recur later 
after initial treatment (higher values of D { ) tend to exhibit 
a more indolent course of disease (as represented by a 
positive value of r^s) and are more likely to be considered 
curable when diagnosed (as represented by a positive value 
of r du ) [53,58-60]. 

In sensitivity analysis, the proportion of patients under- 
going curative salvage surgery was not highly sensitive to 
any single parameter. This is partly explained by the fact 
that approximately six of ten individuals did not recur; 
including these individuals in the denominator somewhat 
attenuates the variation in this outcome. However, even if 
only patients who recurred are included in the denomin- 
ator, no parameter influenced the outcome by more than 
20%. An interesting pattern did emerge. Probability of 
salvage surgery was most directly influenced by changes in 
sensitivity of tests which were not specific to an anatomic 
site (i.e. CEA and clinical exam/interview). In the simplest 
terms, these "tests" cast a wide net, with the potential to 
generate a true positive for any recurrence, regardless of 
site. This observation reinforces the rationale behind the 
current pursuit of newer, highly sensitive circulating 
markers for recurrence. 



Paradoxically, proportion of curative salvage surgeries 
moved in inverse proportion to the specificity of those 
tests which targeted an anatomic site (e.g. hepatic ultra- 
sound). This finding suggests that false positives for 
such tests actually provide an indirect benefit by pre- 
cipitating further workups which diagnose true recur- 
rences in other locations. This paradox underscores the 
importance of considering quality-of-life impact of test- 
ing as well as cost considerations. With these issues in 
mind, our model has been designed so that future ver- 
sions may be used to compare the costs per quality- 
adjusted life year of alternative surveillance regimens. 
Specifically, the surveillance and re-treatment submodel 
has the ability to track costs due to surveillance testing, 
work-up of positive surveillance tests, and subsequent 
curative or palliative treatment for each simulated indi- 
vidual at each point in time. Quality-adjustment of life 
expectancy will be achieved by applying health state 
utility parameters to assigned life expectancies. 

The fact that overall survival of CRC survivors in gen- 
eral is only modestly affected by changes in many of the 
parameters associated with diagnostic test characteristics 
and differential survival suggests an important implication. 
Because most survivors will not recur, and many of those 
who do recur have a narrow window within which early 
detection may change prognosis, developing tools to bet- 
ter identify those at greatest risk and to individualize their 
surveillance will likely yield greater benefit than applying 
generalized surveillance recommendations to all survivors. 
With calibration to larger, individual-based data sets 
covering patients with a range of risk factor profiles, it is 
hoped that future iterations of the model will provide a 
means to compare the effectiveness of proposed surveil- 
lance regimens for risk-based subgroups of patients and to 
even suggest optimized regimens for individuals. 

Multiple models have been developed to evaluate 
CRC screening strategies in healthy populations. These 
models simulate the sequence by which benign polyps 
transform to adenocarcinomas, and by which these adeno- 
carcinomas grow and invade healthy tissue [22-32]. Some 
of these screening models have spawned research ques- 
tions involving disease natural history [28,61], have in- 
formed development of U.S. Preventive Services Task 
Force (USPSTF) guidelines on colorectal cancer screening 
[62], and have been applied by the Centers for Medicare 
and Medicaid Services (CMS) to compare the effectiveness 
of CRC screening strategies [63,64] . 

Only a few models have addressed the issue of surveil- 
lance of curatively treated CRC patients for early detection 
of recurrence. Two of these have not accounted for disease 
progression during diagnostic delay in a manner that would 
allow realistic assessment of novel surveillance strategies 
[34,35]. A separate group of investigators [36] used a so- 
phisticated mechanistic, organ-level simulation to compare 
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different treatment and surveillance strategies in patients 
with hepatic metastases. While the model does account for 
disease progression amid diagnostic delay, it does not pro- 
vide information useful in determining optimal follow-up 
for non-hepatic recurrences. Also, the approach taken may 
be less feasible than a more population-based approach 
when considering metastases to other sites about which less 
detailed data are available. 

A final, recent example of a model examining CRC sur- 
veillance is provided by Erenay et al. [37]. These investiga- 
tors used longitudinal patient data to estimate natural 
history parameters related to formation and progression 
of metachronous colorectal tumors in patients who were 
status post curative resection. Though well-done, this 
model does not address recurrence of original cancers 
(which represent over 90% of relapses [7]). Furthermore, it 
is not validated against any other data source. 

The approach we have described uses calibration of a 
disease progression submodel interacting with a surveil- 
lance and re-treatment submodel to estimate disease pro- 
gression parameters while controlling for the impact of 
surveillance on disease natural history. The effect is to 
estimate group-specific parameters for a model describing 
disease progression in the absence of surveillance. Once 
calibrated, novel combinations and schedules of surveil- 
lance tests can be coupled with the disease progression 
model to predict outcomes for a group with similar under- 
lying characteristics. Further calibration of the model to 
patient-level outcomes data containing individual risk 
factors will enable prediction of outcomes for populations 
or individuals with specific risk profiles, paving the way 
for more personalized and effective surveillance and the 
possibility for earlier detection of recurrence at a stage 
where cure is more likely. 

Limitations 

The chief limitation of this work stems from the small 
size of the patient groups used in calibration and valid- 
ation, and from the lack of individual-level data. With 
larger groups, the probability that underlying disease 
behavior and background mortality would be similar be- 
tween populations used for calibration and populations 
for whom outcome predictions were being sought would 
be greater. We chose the Pietra trial because of the 
wealth of outcomes data it provided and the between- 
study-arm similarity in subject characteristics compared 
to other available published studies. In validation, we 
attempted to alleviate limitations related to group size 
by assigning the appropriate background mortality rates 
and distribution of metastatic sites to match those re- 
ported for the validation group. In a scenario of making 
actual population-level predictions, one might not have 
the luxury of knowing these characteristics of the group 
of interest. However, for our purposes, these adjustments 



can be seen as strengthening the internal validation of the 
model by controlling for known differences between 
groups. In the future, individual-level data describing more 
diverse groups will be used to provide more generalizable 
parameter estimates and a higher-resolution understand- 
ing of the effects of different risk factors. With larger sam- 
ple sizes, far less variability can be assumed in metastasis 
pattern and background mortality between calibration 
populations and populations for which predictions are to 
be made. 

A minor limitation relates to the fact that we did not ex- 
plicitly model the occurrence of second primary (metachro- 
nous) CRCs. Because no subjects in the calibration group 
were diagnosed with metachronous CRC, calibrating the 
model to simultaneously simulate incidence of metachro- 
nous CRC and recurrence of previously-treated disease was 
not a feasible option. Fortunately, the impact of this limita- 
tion on the integrity of the model validation should be min- 
imal since only one individual in the validation group was 
reported to have experienced metachronous CRC. Because 
metachronous CRC (normally representing between 1.6% 
[7] and 7.4% [7,37,65] of CRC recurrences) behaves as a 
primary cancer and is associated with higher probability of 
cure compared to recurrent disease, it will be important to 
explicitly model separate natural history processes for these 
two types of events in future iterations of the model. 

We do not consider the fact that the study is based on 
data gathered in the 1980s and 1990s to be a limitation. 
The purpose of this work was not to make prescriptive 
assertions about survivorship care of CRC patients today, 
but rather to demonstrate a method and preliminarily 
validate it using publicly-available data. That said, some 
useful generalizations are possible regarding the impact of 
diagnostic test characteristics and the importance of devel- 
oping the ability to better customize follow-up of CRC 
survivors as elaborated above. 

Conclusion 

Scheduled post-operative surveillance of patients cura- 
tively treated for colorectal cancer allows the detection of 
asymptomatic recurrence which can create the possibility 
of cure for some patients. We have described and prelim- 
inarily validated a simulation model, based on a novel 
approach, which allows comparison of important patient 
outcomes under hypothetical combinations and schedules 
of common surveillance tests. 

Future work 

Next steps include adapting the model to a discrete 
event simulation framework which will allow calibration 
to individual-level patient data in which subject risk fac- 
tors as well as precise dates of testing events and clinical 
milestones are known. Calibration and validation with 
larger, individual-level data sets representing patients 
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with a more diverse range of demographic and disease 
characteristics is planned. Use of richer data sets (e.g. 
from combined clinical trials, or based on cancer registry 
data) will enable the use of future versions of the model 
to compare the effectiveness and cost-effectiveness of 
published surveillance recommendations for subgroups 
based on cancer stage, location (e.g. rectum versus 
colon), age, comorbidity status, presence of molecular 
markers, adjuvant treatment received, and other consid- 
erations. In addition, the model could be adapted to 
utilize an optimization algorithm to generate candidate 
follow-up strategies for such subgroups— allowing the 
more efficient design of meaningful clinical trials. Future 
adaptations could also power interactive clinical decision 
aids for planning the management of individual patients, 
an approach that itself could be validated in a random- 
ized trial. 
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